49 research outputs found

    Deep learning models for predicting RNA degradation via dual crowdsourcing

    Medicines based on messenger RNA (mRNA) hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally limited by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a key task in designing more stable RNA-based therapeutics. Here, we describe a crowdsourced machine learning competition (‘Stanford OpenVaccine’) on Kaggle, involving single-nucleotide resolution measurements on 6,043 diverse 102–130-nucleotide RNA constructs that were themselves solicited through crowdsourcing on the RNA design platform Eterna. The entire experiment was completed in less than 6 months, and 41% of nucleotide-level predictions from the winning model were within experimental error of the ground truth measurement. Furthermore, these models generalized to blindly predicting orthogonal degradation data on much longer mRNA molecules (504–1,588 nucleotides) with improved accuracy compared with previously published models. These results indicate that such models can represent in-line hydrolysis with excellent accuracy, supporting their use for designing stabilized messenger RNAs. The integration of two crowdsourcing platforms, one for dataset creation and another for machine learning, may be fruitful for other urgent problems that demand scientific discovery on rapid timescales.

    Deep learning models for predicting RNA degradation via dual crowdsourcing

    Messenger RNA-based medicines hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally limited by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a key task in designing more stable RNA-based therapeutics. Here, we describe a crowdsourced machine learning competition ("Stanford OpenVaccine") on Kaggle, involving single-nucleotide resolution measurements on 6043 102-130-nucleotide diverse RNA constructs that were themselves solicited through crowdsourcing on the RNA design platform Eterna. The entire experiment was completed in less than 6 months, and 41% of nucleotide-level predictions from the winning model were within experimental error of the ground truth measurement. Furthermore, these models generalized to blindly predicting orthogonal degradation data on much longer mRNA molecules (504-1588 nucleotides) with improved accuracy compared to previously published models. Top teams integrated natural language processing architectures and data augmentation techniques with predictions from previous dynamic programming models for RNA secondary structure. These results indicate that such models are capable of representing in-line hydrolysis with excellent accuracy, supporting their use for designing stabilized messenger RNAs. The integration of two crowdsourcing platforms, one for data set creation and another for machine learning, may be fruitful for other urgent problems that demand scientific discovery on rapid timescales.
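
The "within experimental error" figure quoted in this abstract can be illustrated with a minimal sketch. It assumes a prediction counts as a hit when its absolute deviation from the measurement is no larger than the error reported for that nucleotide; the abstract does not spell out the exact criterion, and the function name and toy numbers below are hypothetical.

```python
def fraction_within_error(predicted, measured, error):
    """Share of positions where |predicted - measured| <= error (assumed criterion)."""
    if not (len(predicted) == len(measured) == len(error)):
        raise ValueError("all inputs must have the same length")
    if not predicted:
        raise ValueError("inputs must not be empty")
    hits = sum(abs(p - m) <= e for p, m, e in zip(predicted, measured, error))
    return hits / len(predicted)


# Toy per-nucleotide values, invented for illustration only.
predicted = [0.5, 1.0, 0.2, 0.9]
measured = [0.6, 0.7, 0.25, 0.9]
error = [0.15, 0.1, 0.1, 0.05]

share = fraction_within_error(predicted, measured, error)
print(f"{share:.0%} of predictions fall within experimental error")
```

Under this reading, the competition's headline number would be the same ratio computed over all scored nucleotides of the test constructs.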

    A community-powered search of machine learning strategy space to find NMR property prediction models

    The rise of machine learning (ML) has created an explosion in the potential strategies for using data to make scientific predictions. For physical scientists wishing to apply ML strategies to a particular domain, it can be difficult to assess in advance what strategy to adopt within a vast space of possibilities. Here we outline the results of an online community-powered effort to swarm search the space of ML strategies and develop algorithms for predicting atomic-pairwise nuclear magnetic resonance (NMR) properties in molecules. Using an open-source dataset, we worked with Kaggle to design and host a 3-month competition that received 47,800 ML model predictions from 2,700 teams in 84 countries. Within 3 weeks, the Kaggle community produced models with comparable accuracy to our best previously published "in-house" efforts. A meta-ensemble model constructed as a linear combination of the top predictions has a prediction accuracy that exceeds that of any individual model, 7-19x better than our previous state-of-the-art. The results highlight the potential of transformer architectures for predicting quantum mechanical (QM) molecular properties.
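
A meta-ensemble built as a linear combination of model predictions can be sketched as follows. This is only an illustration under stated assumptions: it blends two hypothetical models with weights fitted by ordinary least squares on toy data, whereas the abstract does not say how many models were combined or how the weights were chosen.

```python
def fit_blend_weights(pred_a, pred_b, truth):
    """Least-squares weights (w_a, w_b) minimising sum((w_a*a + w_b*b - y)^2)."""
    aa = sum(a * a for a in pred_a)
    bb = sum(b * b for b in pred_b)
    ab = sum(a * b for a, b in zip(pred_a, pred_b))
    ay = sum(a * y for a, y in zip(pred_a, truth))
    by = sum(b * y for b, y in zip(pred_b, truth))
    det = aa * bb - ab * ab  # zero only if the two predictions are collinear
    if det == 0:
        raise ValueError("predictions are collinear; weights are not unique")
    # Solve the 2x2 normal equations with Cramer's rule.
    return (ay * bb - ab * by) / det, (aa * by - ab * ay) / det


def mean_abs_error(pred, truth):
    return sum(abs(p - y) for p, y in zip(pred, truth)) / len(truth)


# Toy data, invented for illustration: the two models err in opposite directions.
truth = [1.0, 2.0, 3.0, 4.0]
model_a = [1.2, 1.8, 3.2, 3.8]
model_b = [0.8, 2.2, 2.8, 4.2]

w_a, w_b = fit_blend_weights(model_a, model_b, truth)
blend = [w_a * a + w_b * b for a, b in zip(model_a, model_b)]

print("weights:", round(w_a, 3), round(w_b, 3))
print("MAE a, b, blend:",
      round(mean_abs_error(model_a, truth), 3),
      round(mean_abs_error(model_b, truth), 3),
      round(mean_abs_error(blend, truth), 3))
```

The toy case is deliberately favourable: because the two models' errors cancel, the blend beats both. In practice the weights would be fitted on held-out data to avoid overfitting the combination.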

    A deep learning approach to photo–identification demonstrates high performance on two dozen cetacean species

    We thank the countless individuals who collected and/or processed the nearly 85,000 images used in this study and those who assisted, particularly those who sorted these images from the millions that did not end up in the catalogues. Additionally, we thank the other Kaggle competitors who helped develop the ideas, models and data used here, particularly those who released their datasets to the public. The graduate assistantship for Philip T. Patton was funded by the NOAA Fisheries QUEST Fellowship. This paper represents HIMB and SOEST contribution numbers 1932 and 11679, respectively. The technical support and advanced computing resources from University of Hawaii Information Technology Services—Cyberinfrastructure, funded in part by the National Science Foundation CC* awards # 2201428 and # 2232862, are gratefully acknowledged. Every photo–identification image was collected under permits according to relevant national guidelines, regulation and legislation. Peer reviewed.

    Transcriptional recapitulation and subversion of embryonic colon development by mouse colon tumor models and human colon cancer

    Colon tumors from four independent mouse models and 100 human colorectal cancers all exhibited striking recapitulation of embryonic colon gene expression from embryonic days 13.5-18.5.

    Transcriptional recapitulation and subversion of embryonic colon development by mouse colon tumor models and human colon cancer

    Background: The expression of carcino-embryonic antigen by colorectal cancer is an example of oncogenic activation of embryonic gene expression. Hypothesizing that oncogenesis-recapitulating-ontogenesis may represent a broad programmatic commitment, we compared gene expression patterns of human colorectal cancers (CRCs) and mouse colon tumor models to those of mouse colon development embryonic days 13.5-18.5. Results: We report here that 39 colon tumors from four independent mouse models and 100 human CRCs encompassing all clinical stages shared a striking recapitulation of embryonic colon gene expression. Compared to normal adult colon, all mouse and human tumors over-expressed a large cluster of genes highly enriched for functional association to the control of cell cycle progression, proliferation, and migration, including those encoding MYC, AKT2, PLK1 and SPARC. Mouse tumors positive for nuclear β-catenin shifted the shared embryonic pattern to that of early development. Human and mouse tumors differed from normal embryonic colon by their loss of expression modules enriched for tumor suppressors (EDNRB, HSPE, KIT and LSP1). Human CRC adenocarcinomas lost an additional suppressor module (IGFBP4, MAP4K1, PDGFRA, STAB1 and WNT4). Many human tumor samples also gained expression of a coordinately regulated module associated with advanced malignancy (ABCC1, FOXO3A, LIF, PIK3R1, PRNP, TNC, TIMP3 and VEGF). Conclusion: Cross-species, developmental, and multi-model gene expression patterning comparisons provide an integrated and versatile framework for definition of transcriptional programs associated with oncogenesis. This approach also provides a general method for identifying pattern-specific biomarkers and therapeutic targets.
    This delineation and categorization of developmental and non-developmental activator and suppressor gene modules can thus facilitate the formulation of sophisticated hypotheses to evaluate potential synergistic effects of targeting within- and between-modules for next-generation combinatorial therapeutics and improved mouse models.

    Outcomes of Mechanical Thrombectomy for Patients With Stroke Presenting With Low Alberta Stroke Program Early Computed Tomography Score in the Early and Extended Window

    Importance: Limited data are available about the outcomes of mechanical thrombectomy (MT) for real-world patients with stroke presenting with a large core infarct. Objective: To investigate the safety and effectiveness of MT for patients with large vessel occlusion and an Alberta Stroke Program Early Computed Tomography Score (ASPECTS) of 2 to 5. Design, setting, and participants: This retrospective cohort study used data from the Stroke Thrombectomy and Aneurysm Registry (STAR), which combines the prospectively maintained databases of 28 thrombectomy-capable stroke centers in the US, Europe, and Asia. The study included 2345 patients presenting with an occlusion in the internal carotid artery or M1 segment of the middle cerebral artery from January 1, 2016, to December 31, 2020. Patients were followed up for 90 days after intervention. The ASPECTS is a 10-point scoring system based on the extent of early ischemic changes on the baseline noncontrasted computed tomography scan, with a score of 10 indicating normal and a score of 0 indicating ischemic changes in all of the regions included in the score. Exposure: All patients underwent MT in one of the included centers. Main outcomes and measures: A multivariable regression model was used to assess factors associated with a favorable 90-day outcome (modified Rankin Scale score of 0-2), including interaction terms between an ASPECTS of 2 to 5 and receiving MT in the extended window (6-24 hours from symptom onset). Results: A total of 2345 patients who underwent MT were included (1175 women [50.1%]; median age, 72 years [IQR, 60-80 years]; 2132 patients [90.9%] had an ASPECTS of ≥6, and 213 patients [9.1%] had an ASPECTS of 2-5). 
    At 90 days, 47 of the 213 patients (22.1%) with an ASPECTS of 2 to 5 had a modified Rankin Scale score of 0 to 2 (25.6% [45 of 176] of patients who underwent successful recanalization [modified Thrombolysis in Cerebral Ischemia score ≥2B] vs 5.4% [2 of 37] of patients who underwent unsuccessful recanalization; P = .007). Having a low ASPECTS (odds ratio, 0.60; 95% CI, 0.38-0.85; P = .002) and presenting in the extended window (odds ratio, 0.69; 95% CI, 0.55-0.88; P = .001) were associated with worse 90-day outcome after controlling for potential confounders, without significant interaction between these 2 factors (P = .64). Conclusions and relevance: In this cohort study, more than 1 in 5 patients presenting with an ASPECTS of 2 to 5 achieved 90-day functional independence after MT. A favorable outcome was nearly 5 times more likely for patients with low ASPECTS who had successful recanalization. The association of a low ASPECTS with 90-day outcomes did not differ for patients presenting in the early vs extended MT window.
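
The percentages reported for the low-ASPECTS group can be re-derived from the raw counts given in the abstract. The sketch below is a crude, unadjusted check; it assumes the "nearly 5 times more likely" statement corresponds to the simple ratio of the two proportions, which the abstract does not state explicitly, and the variable names are illustrative.

```python
# Counts taken from the abstract (patients with an ASPECTS of 2 to 5).
favorable_total, low_aspects_total = 47, 213
favorable_recanalized, recanalized_total = 45, 176
favorable_unsuccessful, unsuccessful_total = 2, 37

overall_rate = favorable_total / low_aspects_total
recanalized_rate = favorable_recanalized / recanalized_total
unsuccessful_rate = favorable_unsuccessful / unsuccessful_total

# Unadjusted ratio of proportions (assumed reading of "nearly 5 times").
crude_ratio = recanalized_rate / unsuccessful_rate

print(f"overall favorable outcome: {overall_rate:.1%}")
print(f"successful recanalization: {recanalized_rate:.1%}")
print(f"unsuccessful recanalization: {unsuccessful_rate:.1%}")
print(f"crude ratio: {crude_ratio:.2f}")
```

The two subgroup denominators (176 and 37) sum to 213 and the numerators (45 and 2) sum to 47, so the counts are internally consistent with the overall figure.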